ON THE GENERALIZED FREQUENCY FUNCTIONS OF EDGEWORTH.

I.  INTRODUCTION. 

While many distributions follow the normal law, if allowance is made for deviations due to random sampling, there are well known classes of variates which do not follow this law. The distribution of statures of certain classes of men fits the normal curve. It is only reasonable to expect that if linear measurements follow the normal law, the corresponding similar surfaces and volumes should be distributed in accord with some transformation of that law.

Let $x_1, x_2, \ldots, x_n$ be the variates of a distribution. If $X_1, X_2, \ldots, X_n$ are a new system of variates where

$$X_1 = kx_1, \quad X_2 = kx_2, \quad \ldots, \quad X_n = kx_n,$$

the type of the curve is not changed, as this transformation alters only the scale or the size of the modulus.

If, however, a function other than a linear function of the variates is substituted for each variate, the form of the curve will be changed. For example, if we make the transformations

$$X_1 = kx_1^2, \quad X_2 = kx_2^2, \quad X_3 = kx_3^2, \quad \ldots, \quad X_n = kx_n^2,$$

and

$$X_1 = kx_1^3, \quad X_2 = kx_2^3, \quad X_3 = kx_3^3, \quad \ldots, \quad X_n = kx_n^3,$$

the transformed frequency distributions may be regarded as distributions of surfaces and volumes. It is but natural to assume that certain distributions are in the nature of similar surfaces and volumes whose linear dimensions follow the normal law. In more general terms, it is but natural to expect that certain observed variates are functions of more fundamental elements where those fundamental elements are normally distributed.

This conception of transformation of variates is fundamental in Edgeworth's "Method of Translation", and in certain theories which he has advanced concerning generalized frequency curves.*

It is a purpose of this paper to test the practicability of regarding some observed distributions as transformations of more elementary variates which follow the normal curve and to investigate the characters of some functions that are obtained by simple transformations of a normal distribution.

More precisely, if $x_1, x_2, \ldots, x_n$ are a set of elementary variates that are normally distributed, new variates $X_1, X_2, \ldots, X_n$ may be formed which are functions of these more elementary variates, say of the form

$$X_r = a_1x_r + a_2x_r^2 + \ldots + a_nx_r^n + \ldots \qquad (1).$$

The $X$'s would of necessity follow some law of frequency. It may be of interest to determine the law for certain cases. The cases in which only a few terms of series (1) need be used are perhaps of the most interest. If $a_2, a_3, \ldots$ are small as compared with $a_1$ and we let $\kappa = a_2/a_1$, $\lambda = a_3/a_1$, we may write

$$X_r = a_1(x_r + \kappa x_r^2 + \lambda x_r^3 + \ldots).$$

It is not usually necessary to go beyond the third power in $x$ provided $\kappa$ and $\lambda$ are small decreasing numbers as they are in the numerical illustration of section V. Or conversely, having an observed distribution given, we shall inquire into the consequences of treating it as a distribution which is formed by transformation from a certain fundamental normal distribution.

* Fifth International Congress of Mathematicians, vol. II, page 428, 1912. Journal of the Royal Statistical Society, 1898-1900.

II. TRANSFORMATION $X = x^2$.

This  is  a  very  special  type  of  transformation    which  surfaces  might 
be  expected  to  follow,  since  similar  surfaces  are  to  each  other  as  the 
squares  of  their  linear  dimensions. 

Let $f(X)\,dX$ be this distribution formed by squaring each variable of a system that follows the normal curve of standard deviation $\sigma$ and center of gravity at $x = a$. Then the integral

$$\frac{1}{\sigma\sqrt{2\pi}}\int e^{-\frac{(x-a)^2}{2\sigma^2}}\,dx$$

gives the probability that a variable $x$ of the generating system falls between assigned limits of integration. Since $X = x^2$, $dX = 2x\,dx$, and $dx = \frac{dX}{2\sqrt{X}}$. Substituting $\sqrt{X}$ for $x$ and $\frac{dX}{2\sqrt{X}}$ for $dx$ in the above equation,

$$\int f(X)\,dX = \frac{1}{\sigma\sqrt{2\pi}}\int e^{-\frac{(\sqrt{X}-a)^2}{2\sigma^2}}\,\frac{dX}{2\sqrt{X}}.$$

Since each element of the original group is squared to form the transformed distribution, each negative element of the normal distribution will become positive when squared. For this reason, the area under the normal curve is taken between the limits, say from $-\alpha$ to $+\alpha$, where $\alpha$ is any positive number, and the area under the transformed curve is taken from 0 to $\alpha^2$. By hypothesis, $\alpha$ is any positive number, and we may extend the limits to infinity for convenience of integration. Then

$$\int_0^{\infty} f(X)\,dX = \frac{1}{2\sigma\sqrt{2\pi}}\int_0^{\infty}\frac{1}{\sqrt{X}}\,e^{-\frac{(\sqrt{X}-a)^2}{2\sigma^2}}\,dX.$$


As the unknown origin O is at a distance $a$ from the center of gravity of the normal curve, $a^2$ is the distance from the median of the transformed curve to the same origin O, provided $a$ is large enough so that none of the variates of the normal curve are below the origin.

Let $\xi$ be the distance from the origin to the mean of the observed curve and let $\sigma\sqrt{2}$ be the modulus of the normal curve.

By method of moments we shall determine the three constants. Since each variate $x$ of the normal curve is replaced by $x^2$, we have

$$\int X^n f(X)\,dX = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{+\infty} x^{2n}\,e^{-\frac{(x-a)^2}{2\sigma^2}}\,dx,$$

from which we see that we can use our knowledge of the normal curve in evaluating the moments.
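First moment.

$$\xi = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{+\infty} x^2\,e^{-\frac{(x-a)^2}{2\sigma^2}}\,dx = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{+\infty}(x'+a)^2\,e^{-\frac{x'^2}{2\sigma^2}}\,dx'$$

when we put $x = x' + a$. Integrating by parts,

$$\xi = a^2 + \sigma^2 \qquad (1).$$

The mean, therefore, is equal to the median plus $\sigma^2$.

Second moment. The second moment coefficient about the center of gravity being denoted by $\mu_2$, that about the origin is

$$\xi^2 + \mu_2 = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{+\infty}(x'+a)^4\,e^{-\frac{x'^2}{2\sigma^2}}\,dx'.$$

Integrating by parts gives

$$\xi^2 + \mu_2 = a^4 + 6a^2\sigma^2 + 3\sigma^4 \qquad (2).$$

Third moment. The third moment coefficient about the center of gravity is $\mu_3$, that about the origin is

$$\xi^3 + 3\xi\mu_2 + \mu_3 = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{+\infty}(x'+a)^6\,e^{-\frac{x'^2}{2\sigma^2}}\,dx' = a^6 + 15a^4\sigma^2 + 45a^2\sigma^4 + 15\sigma^6 \qquad (3).$$

By substituting the value of $\xi$ from equation (1) into (2) and (3), we get

$$\mu_2 = 4a^2\sigma^2 + 2\sigma^4 \qquad (4)$$

$$\mu_3 = 24a^2\sigma^4 + 8\sigma^6 \qquad (5)$$

Eliminating $a$ from (4) and (5),

$$4\sigma^6 - 6\sigma^2\mu_2 + \mu_3 = 0.$$

Substituting $f$ for $\sigma^2$,

$$4f^3 - 6f\mu_2 + \mu_3 = 0 \qquad (6).$$

This cubic is easily solvable by Horner's method and the other constants may be determined from equations (4) and (5).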
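The elimination above can be checked numerically. The following is a minimal sketch, not the author's procedure: it replaces Horner's method with plain bisection, and the function name `fit_square_transform` is hypothetical. It assumes $\mu_3 > 0$ and looks for the smaller positive root of cubic (6), which lies between 0 and the local minimum of the cubic at $f = \sqrt{\mu_2/2}$.

```python
import math


def fit_square_transform(mu2, mu3, tol=1e-12):
    """Recover (sigma^2, a^2) for X = x^2 from the central moments mu2, mu3.

    Solves 4 f^3 - 6 mu2 f + mu3 = 0 for the smaller positive root f = sigma^2,
    then uses equation (4): mu2 = 4 a^2 sigma^2 + 2 sigma^4.
    """
    def g(f):
        return 4.0 * f ** 3 - 6.0 * mu2 * f + mu3

    # The cubic has its local minimum at f = sqrt(mu2 / 2); the root wanted
    # here lies between 0 and that point.
    lo, hi = 0.0, math.sqrt(mu2 / 2.0)
    if not (g(lo) > 0.0 > g(hi)):
        raise ValueError("no admissible root between 0 and sqrt(mu2/2)")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    f = 0.5 * (lo + hi)
    a_squared = (mu2 - 2.0 * f * f) / (4.0 * f)
    return f, a_squared


# Round trip with a = 5, sigma = 1: equations (4) and (5) give mu2 = 102, mu3 = 608.
print(fit_square_transform(102.0, 608.0))
```

The round trip recovers $\sigma^2 = 1$ and $a^2 = 25$, which is a quick consistency check on equations (4), (5) and (6).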

Geometric meaning of this transformation. One peculiarity of this type of transformation is the infinite discontinuity at the origin. No matter what the modulus is or how far the center of gravity of the curve is removed from the origin, there is always at least one point of infinite discontinuity at the origin.

If we allow $\sigma = 1$ and differentiate $\frac{1}{\sqrt{X}}\,e^{-\frac{(\sqrt{X}-a)^2}{2}}$ we shall be able to find the maxima, minima and points of inflection of the theoretical curve,

$$-\frac{1}{2X^{3/2}}\,e^{-\frac{(\sqrt{X}-a)^2}{2}} - \frac{\sqrt{X}-a}{2X}\,e^{-\frac{(\sqrt{X}-a)^2}{2}} = 0, \qquad X - a\sqrt{X} + 1 = 0.$$

We note that for $a < 2$ there is no maximum, minimum or point of inflection. For $a = 2$ there is a point of inflection at $X = 1$, but there is neither maximum nor minimum. For $a > 2$ we have one maximum and one minimum. For $a = 4$, $X = 13.92$ or $0.07$.
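The cases just listed follow from the quadratic in $\sqrt{X}$. A minimal sketch, assuming $\sigma = 1$ as in the text; the function name `stationary_points_square` is hypothetical:

```python
import math


def stationary_points_square(a):
    """Stationary points X of y = X**-0.5 * exp(-(sqrt(X) - a)**2 / 2), sigma = 1.

    With u = sqrt(X), the condition dy/dX = 0 reduces to u**2 - a*u + 1 = 0.
    """
    disc = a * a - 4.0
    if disc < 0.0:
        return []                      # a < 2: no real stationary point
    r = math.sqrt(disc)
    roots = sorted({(a - r) / 2.0, (a + r) / 2.0})
    return [u * u for u in roots]      # back from u = sqrt(X) to X


for a in (1.0, 2.0, 4.0):
    print(a, stationary_points_square(a))
```

For $a = 4$ the two values are $7 \mp 4\sqrt{3}$, the minimum near the origin and the maximum near 13.9.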

This indicates that it is necessary to have the origin back a distance of at least $4\sigma$ in the negative direction from the center of the normal curve in order that all the end values of the normal curve may be included in the positive field. If the origin is not back far enough, the negative end, on being squared, will distort the distribution in a most surprising way, as will be seen presently.

Figure 1 is the normal curve. In case $a = \sigma$ there is not the slightest resemblance between the elementary normal curve and the transformed curve (fig. 6). When $X = 0$, $y = \infty$. From this point there is a continuous curve without maximum, minimum or point of inflection.

When $a = 2\sigma$ the curve is hardly recognizable as related to the normal curve; as shown in figure 5, there is neither maximum nor minimum. At $X = 1$, there is a point of inflection with the inflectional tangent parallel to the $X$-axis.

Figure 7 illustrates the distortion for $a = 3\sigma$. In this case the maximum is at 6.85 and the minimum at 0.15. Even here the origin is not back far enough.

If $a = 4\sigma$ the curve presents the general appearance of a much flattened normal curve as shown in figure 8.

It is therefore necessary, if we are to expect the transformed curve to be similar in general appearance to the fundamental normal curve, that the centroid of the normal curve be at a distance of at least $4\sigma$ above the origin; otherwise our transformed curve will be utterly different in general appearance from the normal curve.

It might well be expected that the weights of 5082 men who are five feet, nine inches tall and are between the ages of twenty and twenty-four years would be distributed according to some transformed curve. I have accordingly selected an illustration from the Medico-Actuarial Mortality Investigation, vol. I, page 41.

Sheppard's corrections were applied to the moments calculated about the centroid because of high contact at the ends of the range.

If $\mu_1, \mu_2, \mu_3, \mu_4$ be written for the adjusted moments, it was found that

$\mu_1 = 0$
$\mu_2 = 190.08565$
$\mu_3 = 2203.6126$
$\mu_4 = 174,557.2162$

The mean is at 150.096418. The calculated theoretical normal frequency distribution is given in Table I.

The class mark of each frequency group is given in column I. The difference between the centroid and the class mark $X$ is given in column II. Column II, divided by the standard deviation $\sigma$, is given in column III. The fractional area under the curve between 0 and $x/\sigma$, as it is tabulated in books on Probability*, is in column IV. The entire area is given in column V, the calculated frequency in column VI, the observed frequency in column VII, and the residuals in column VIII.

TABLE I.

   I.       II.       III.       IV.        V.      VI.   VII.  VIII.

  97.5    52.5964   3.81437   0.49993    2541        1      1      0
 102.5    47.5964   3.45222   0.49972    2539.6      3    ...      3
 107.5    42.5964   3.08956   0.49928    2537       12      1     11
 112.5    37.5964   2.72691   0.49683    2525       30      2     28
 117.5    32.5964   2.36425   0.49091    2495       69     26     43
 122.5    27.5964   2.0016    0.47728    2426      142     94     48
 127.5    22.5964   1.63394   0.44384    2284      256    248      8
 132.5    17.5964   1.27628   0.39906    2028      404    483     84
 137.5    12.5964   0.91363   0.31954    1624      561    724    163
 142.5     7.5964   0.55097   0.20916    1063      683    746     63
 147.5     2.5964   0.18832   0.07468     380 }
 152.5     2.4036   0.17434   0.06919     351 }    731    817     86
 157.5     7.4036   0.53691   0.20432    1039      678    600     78
 162.5    12.4036   0.89964   0.31584    1605      566    513     53
 167.5    17.4036   1.26230   0.39657    2015      410    315     95
* Davenport, Statistical Methods, p. 119.
I.

II. 

III. 

IV. 

V. 

VI. 

VII. 

VIII. 

1  70 

0077 

<jG  c 

ona 

1 77 

07  Afl3ft 

iC)t£><t 

j.'J«j 

10r! 
J.UO 

47 

'it 

1  SO  n 

as  4.028 

2^35027 

0-4906? 

2493 

71 

75 

A 
'i 

1R7  5 

97  A03« 

2.71292 

0.49663 

2524 

31 

28 

7 
/ 

19?  5 

3.07553 

0.49395 

2536 

12 

28 

16 

197.5 

47.4038 

3.43328 

0.49971 

2539.5 

4 

18 

14 

202, 5 

52.4036 

3.80089 

0.49993 

2540.6 

1 

17 

16 

212.5 

57.4038 

4.1635 

0.5 

2541 

.4 

5 

4.6 

217.5 

82.4036 

Beyond  the  tables. 

4 

4. 

222.5 

2 

2. 

227.5 

1 

1 

232.5 

2 

2 

237.5 

. . 

0 

242.5 

0 

247.5 

1 

1 

0 

The normal function does not describe this frequency distribution at all, as is obvious from the above table.

Before calculating the theoretical distribution in case of squares it is necessary to determine the values of the constants $a$ and $\sigma$. Substituting the values of $\mu_2$ and $\mu_3$ into equation (6),

$$4f^3 - 6\times 190.08565\,f + 2203.6126 = 0.$$

It is easily shown that this equation has three real roots of which two are positive and one is negative. The negative value would make $\sigma^2$ negative; the larger positive value would make $a$ smaller than $\sigma$. A distribution of this sort is shown in fig. 6. The central value is, therefore, the appropriate one.

$$f = 1.9585.$$

Since $f = \sigma^2$, $\sigma = \pm 1.3937$.

Substituting the value of $\sigma$ in equation (4),

$a^2 = 23.74563$,

$a = \pm 4.3759$.

With $a$ no larger as compared to $\sigma$ we could hardly expect an excellent fit.

In table II, an attempt was made to fit this observed distribution by means of a function where $X = x^2$, the $x$'s being the variates of the theoretical normal distribution. Column I contains the class marks; col. II, the distance from the origin to the median of the transformed curve plus the deviation from that median, $a^2 + X - M$; col. III, $\frac{\sqrt{a^2 + X - M} - a}{\sigma} = \frac{x'}{\sigma}$; col. IV, the fractional area between 0 and $\frac{x'}{\sigma}$*; col. V, fractional area times the number of frequencies; col. VI, the calculated frequencies; col. VII, the observed frequencies, and col. VIII the residuals.

TABLE II.
Above the median.

   I.       II.       III.       IV.        V.      VI.    VII.   VIII.

 152.5    28.1367   0.30635   0.12032    611.4    717.   817.   100.
 157.5    33.1367   0.62591   0.23431   1191.     580.   600.    20.
 162.5    38.1367   0.92912   0.32357   1644.     453.   513.    60.
 167.5    43.1367   1.20967   0.38365   1948.     304.   315.    11.
 172.5    48.1367   1.4743    0.42979   2184.     236.   208.    28.
 177.5    53.1367   1.7256    0.45778   2326.     142.   108.    34.
 182.5    58.1367   1.98529   0.47530   2415.      89.    75.    14.
 187.5    63.1367   2.19482   0.48591   2469.      54.    38.    16.
 192.5    68.1367   2.4157    0.49214   2501.      32.    28.     4.
 197.5    73.1367   2.62315   0.49569   2519.      18.    18.     0.
 202.5    78.1367   2.8337    0.49789   2529.      10.    17.     7.
* Davenport, Statistical Methods, pages 119-125.
I.

II. 

HI. 

IV. 

V.  VI. 

VII. 

VIII. 

207.5 

33.1367 

3.03274 

0.49380 

2535.  6. 

5. 

U 

212.5 

88.1367 

3.22592 

0.49937 

2537.  2. 

4. 

2. 

217.5 

93.1367 

3.41355 

0.49963 

2539.  2. 

2. 

0. 

222.5 

98.1367 

3.59648 

0.49933 

2540.  U 

1. 

0. 

227.5 

103.1367 

3.7749 

0.49992 

2540.5  0.5 

2. 

1.5 

232. 5 

108.1367 

3.94702 

0.49996 

2540.7  0.2 

. . 

0.2 

927  R 

112  1287 

4.11732 

Bevond 

the  tables. 

0. 

1. 

u 

Below  the 

median. 

147.5 

23.1367 

0.04704 

0.01875 

96. 

142.5 

18.1367 

0.44126 

0.17043 

368.  770. 

746. 

24. 

137.5 

13.1367 

0.96625 

0.33302 

1692.  326. 

724. 

102. 

132.5 

8.1367 

1.4467 

0.425593 

2163.  471. 

483. 

17. 

127.5 

3.1367 

2.21934 

0.43679 

2474.  311. 

248. 

63. 

122.5 

-1.8633 

imaginary 

• 

94. 

94. 

26. 

26. 

2. 

2. 

1. 

U 

0. 

0. 

1. 

1. 

This transformation describes the upper end of the distribution fairly well; toward the center the transformed curve is too low, and below the median there is no fit, the values becoming imaginary after the fourth. The appearance of the imaginary variates at the lower end of the distribution shows that the origin is not back far enough, in other words, that part of the variates of the elementary normal distribution fall in the negative region. After squaring these variates they become positive and change the form of the curve.

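III. TRANSFORMATION $X = x^3$.

In case of similar volumes it seems quite reasonable that we might have a distribution $X = x^3$, where each variate of the normal distribution is cubed. In case the variates distribute themselves in this fashion, the constants may be determined by method of moments, in a manner similar to that shown in section II.

$$\int X^n f(X)\,dX = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{+\infty} x^{3n}\,e^{-\frac{(x-a)^2}{2\sigma^2}}\,dx.$$

The limits of the observed function may be taken from $-\infty$ to $+\infty$ as the cubes of the negative numbers remain negative.

First moment.

$$\xi = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{+\infty} x^3\,e^{-\frac{(x-a)^2}{2\sigma^2}}\,dx.$$

Let $x = x' + a$; then

$$\xi = 3a\sigma^2 + a^3.$$

Second moment.

$$\xi^2 + \mu_2 = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{+\infty}(x'+a)^6\,e^{-\frac{x'^2}{2\sigma^2}}\,dx' = a^6 + 15a^4\sigma^2 + 45a^2\sigma^4 + 15\sigma^6.$$

Third moment.

$$\xi^3 + 3\xi\mu_2 + \mu_3 = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{+\infty}(x'+a)^9\,e^{-\frac{x'^2}{2\sigma^2}}\,dx' = a^9 + 36a^7\sigma^2 + 378a^5\sigma^4 + 1260a^3\sigma^6 + 945a\sigma^8.$$

Eliminating $\xi$,

$$\mu_2 = 9a^4\sigma^2 + 36a^2\sigma^4 + 15\sigma^6,$$

$$\mu_3 = 945a\sigma^8 + 1233a^3\sigma^6 + 351a^5\sigma^4 + 27a^7\sigma^2 - 3a\mu_2(3\sigma^2 + a^2).$$

Equations of this sort are of too high a degree and too complicated to be of use in making an application to observed distributions.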
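The coefficients in these moment expansions can be checked without integration, since $E[(a+z)^k]$ for a normal $z$ of mean zero follows from the binomial theorem and the even moments $E[z^{2m}] = \sigma^{2m}(2m-1)!!$. The sketch below assumes nothing beyond that identity; the helper name `normal_raw_moment` is hypothetical.

```python
from math import comb


def normal_raw_moment(k, a, sigma):
    """E[(a + z)**k] for z ~ N(0, sigma**2), by the binomial expansion."""
    total = 0.0
    for j in range(0, k + 1, 2):           # odd moments of z vanish
        double_factorial = 1
        for m in range(1, j, 2):           # (j - 1)!! = 1 * 3 * 5 * ... * (j - 1)
            double_factorial *= m
        total += comb(k, j) * a ** (k - j) * sigma ** j * double_factorial
    return total


a, sigma = 2, 1
xi = normal_raw_moment(3, a, sigma)                # first moment of X = x^3
mu2 = normal_raw_moment(6, a, sigma) - xi ** 2     # second central moment
print(xi, mu2, normal_raw_moment(9, a, sigma))
```

With $a = 2$, $\sigma = 1$ the first moment is $a^3 + 3a\sigma^2 = 14$ and the second central moment agrees with $9a^4\sigma^2 + 36a^2\sigma^4 + 15\sigma^6$.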

Since the cubes of negative numbers are negative it would, perhaps, at first thought, seem that the position of the origin would not affect the transformation. This, however, is not true. When $a = 0$ the transformed curve

$$y = \frac{1}{3\sigma\sqrt{2\pi}}\,X^{-\frac{2}{3}}\,e^{-\frac{X^{2/3}}{2\sigma^2}}$$

is similar in some respects to

$$y = \frac{1}{2\sigma\sqrt{2\pi}}\,X^{-\frac{1}{2}}\,e^{-\frac{(\sqrt{X})^2}{2\sigma^2}}.$$

Both of these curves begin at $\infty$ when $X = 0$, then descend very rapidly for a time, touching the $X$-axis at infinity. The fundamental difference between the two cases is that the case of cubes has a pair of curves beginning at infinity for $X = 0$ and extending in opposite directions and intersecting the $X$-axis at $-\infty$ and $+\infty$. Figure no. 2 is a graph of this curve. It will be noted that in the case of squares there is a single curve similar in shape to this curve in the first quadrant.

It is necessary that $a > \sigma\sqrt{8}$ in order that we may have a maximum or a minimum on either side of the $y$-axis. This function, as well as the function discussed in section II, has an infinite discontinuity at the origin. With regard to this infinite discontinuity, it does not seem likely that there would be any analogy in nature.

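Let us determine the maxima and minima of

$$y = X^{-\frac{2}{3}}\,e^{-\frac{(X^{1/3}-a)^2}{2}},$$

$$\frac{dy}{dX} = -\frac{2}{3}X^{-\frac{5}{3}}\,e^{-\frac{(X^{1/3}-a)^2}{2}} - \frac{1}{3}X^{-\frac{4}{3}}(X^{\frac{1}{3}}-a)\,e^{-\frac{(X^{1/3}-a)^2}{2}} = 0,$$

$$X^{\frac{2}{3}} - aX^{\frac{1}{3}} + 2 = 0.$$

For $a = \sigma$ or $a = 2\sigma$, $X$ is imaginary; there is, therefore, no maximum, minimum or point of inflection. See figs. no. 3 and 4. For $a = 3\sigma$, $X = 1$ or 8, making a minimum at 1 and a maximum at 8. If the variates of the elementary normal curve are all positive, there will be a smooth flat curve, similar in form to the normal curve.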
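The condition on $a$ and the stationary values quoted for $a = 3\sigma$ can be reproduced from the quadratic in $X^{1/3}$. A minimal sketch; the function name `stationary_points_cube` is hypothetical and the general-$\sigma$ form $u^2 - au + 2\sigma^2 = 0$ is the same condition before setting $\sigma = 1$.

```python
import math


def stationary_points_cube(a, sigma=1.0):
    """Stationary points X of y = X**(-2/3) * exp(-(X**(1/3) - a)**2 / (2 sigma**2)).

    With u = X**(1/3), dy/dX = 0 reduces to u**2 - a*u + 2*sigma**2 = 0,
    which has real roots only when a >= sigma * sqrt(8).
    """
    disc = a * a - 8.0 * sigma * sigma
    if disc < 0.0:
        return []
    r = math.sqrt(disc)
    roots = sorted({(a - r) / 2.0, (a + r) / 2.0})
    return [u ** 3 for u in roots]         # back from u = X**(1/3) to X


for a in (1.0, 2.0, 3.0):
    print(a, stationary_points_cube(a))
```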
IV. TRANSFORMATION $X = a_1x + a_2x^2$.


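This is a more general transformation than those treated in sections II and III. There is a certain propriety in using the first two or three terms of a Taylor's expansion. In this section, we will consider only the first two terms of the expansion. It is necessary that $a_2$ shall be small as compared to $a_1$, otherwise the curve will be very much distorted from a normal curve, since the theoretical distribution is here calculated from the median of the observed distribution.

It is equally general to make $\sigma = \frac{1}{\sqrt{2}}$ or $\sigma\sqrt{2} = 1$, and to replace $x$ by $\Xi$, thus using Edgeworth's notation. Since by hypothesis $a_2$ is small compared to $a_1$, write $a$ for $a_1$ and let $\frac{a_2}{a_1} = \kappa$ where $\kappa$ is small. Then

$$X = a(\Xi + \kappa\Xi^2)$$

and

$$dX = a(1 + 2\kappa\Xi)\,d\Xi;$$

then

$$\Xi = \frac{-1 \pm \sqrt{1 + \frac{4\kappa X}{a}}}{2\kappa}.$$

We shall denote the moments about the median as $M_1, M_2, M_3, M_4$ respectively, and $\mu_1, \mu_2, \mu_3, \mu_4$ denote the moments about the center of gravity.

First moment.

$$M_1 = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{+\infty} a(\Xi + \kappa\Xi^2)\,e^{-\Xi^2}\,d\Xi = \frac{a\kappa}{2}.$$

Second moment.

$$M_2 = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{+\infty} a^2(\Xi + \kappa\Xi^2)^2\,e^{-\Xi^2}\,d\Xi = a^2\left(\frac{1}{2} + \frac{3}{4}\kappa^2\right).$$

Third moment.

$$M_3 = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{+\infty} a^3(\Xi + \kappa\Xi^2)^3\,e^{-\Xi^2}\,d\Xi = \frac{9}{4}\kappa a^3 + \frac{15}{8}\kappa^3a^3.$$

$$\mu_2 = M_2 - M_1^2 = \frac{a^2}{2}(1 + \kappa^2).$$

$$\mu_3 = M_3 - 3M_1\mu_2 - M_1^3 = a^3\kappa\left(\frac{3}{2} + \kappa^2\right).$$

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{8\kappa^2\left(\frac{9}{4} + 3\kappa^2 + \kappa^4\right)}{(1 + \kappa^2)^3} \qquad (1)$$

From (1),

$$8\kappa^2\left(\tfrac{9}{4} + 3\kappa^2 + \kappa^4\right) - \beta_1(1 + 3\kappa^2 + 3\kappa^4 + \kappa^6) = 0.$$

Let $\chi = \kappa^2$. Then

$$8\chi\left(\tfrac{9}{4} + 3\chi + \chi^2\right) - \beta_1(1 + 3\chi + 3\chi^2 + \chi^3) = 0,$$

$$18\chi + 24\chi^2 + 8\chi^3 - \beta_1(1 + 3\chi + 3\chi^2 + \chi^3) = 0.$$

$$X = a(\Xi + \kappa\Xi^2), \qquad (2)$$

$$\Xi = \frac{-1 \pm \sqrt{1 + \frac{4\kappa X}{a}}}{2\kappa} \qquad (3).$$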
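Equation (3) is the step that later produces imaginary values, so it is worth seeing in isolation. The sketch below is an illustration rather than the author's computation: the function name `xi_from_x` is hypothetical, the upper sign of (3) is taken (the branch that reduces to $X/a$ as $\kappa \to 0$), and the sample constants are the values of $a$ and $\kappa$ found for the weight data later in this section.

```python
import math


def xi_from_x(x, a, kappa):
    """Invert X = a*(Xi + kappa*Xi**2) for Xi, taking the upper sign of (3).

    Returns None when 1 + 4*kappa*X/a is negative, i.e. when Xi is imaginary.
    """
    disc = 1.0 + 4.0 * kappa * x / a
    if disc < 0.0:
        return None
    return (-1.0 + math.sqrt(disc)) / (2.0 * kappa)


a, kappa = 19.1028, 0.2044504      # constants quoted later for the weight data
print(xi_from_x(3.7087, a, kappa))    # first class above the median
print(xi_from_x(-21.2913, a, kappa))  # still real
print(xi_from_x(-26.2913, a, kappa))  # None: X lies below -a/(4*kappa)
```

Real values of $\Xi$ exist only for $X \ge -\frac{a}{4\kappa}$, which is why the lower classes of the table cannot be fitted.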


Since $\Xi = \frac{x}{\sigma\sqrt{2}}$ it is highly improbable for $|\Xi|$ to be greater than 5. In fact we shall prove presently that negative values of $\Xi$ lie between 0 and $-5$. This transformation will be applied to the data used in section II. To do this it is necessary to get first the values of $a$ and $\kappa$.

From (1)

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{(2203.6126)^2}{6868280.078} = 0.707005.$$

$$7.292995\chi^3 + 21.878985\chi^2 + 15.878985\chi - 0.707005 = 0,$$

$$\chi^3 + 3\chi^2 + 2.177292\chi - 0.096943 = 0,$$

$\chi = 0.0418$.

$\kappa^2 = \chi$,
$\kappa = \pm 0.2044504$.

$\mu_2 = \frac{a^2}{2}(1 + \kappa^2)$,
$a^2 = 364.9177335$,
$a = \pm 19.1028$.
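The same constants can be recovered by machine. This is a minimal sketch under stated assumptions: bisection stands in for Horner's method, the function name `fit_two_term_transform` is hypothetical, and the bracket $0 < \chi < 1$ is valid only when $\beta_1 < 6.25$. The left side of (1) increases steadily with $\chi$, so the root in the bracket is unique.

```python
import math


def fit_two_term_transform(mu2, mu3, tol=1e-12):
    """Solve for chi = kappa**2, kappa and a in X = a*(Xi + kappa*Xi**2).

    Uses beta1 = mu3**2 / mu2**3 = 8*chi*(9/4 + 3*chi + chi**2) / (1 + chi)**3
    and mu2 = a**2 * (1 + chi) / 2.
    """
    beta1 = mu3 * mu3 / mu2 ** 3
    if not 0.0 < beta1 < 6.25:
        raise ValueError("beta1 outside the range handled by this sketch")

    def g(chi):
        return 8.0 * chi * (2.25 + 3.0 * chi + chi * chi) - beta1 * (1.0 + chi) ** 3

    lo, hi = 0.0, 1.0                  # g(0) = -beta1 < 0, g(1) = 50 - 8*beta1 > 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    chi = 0.5 * (lo + hi)
    kappa = math.copysign(math.sqrt(chi), mu3)   # skewness fixes the sign of kappa
    a = math.sqrt(2.0 * mu2 / (1.0 + chi))
    return chi, kappa, a


print(fit_two_term_transform(190.08565, 2203.6126))
```

For the adjusted moments of the weight data the root comes out near 0.042, in line with the hand computation, and substituting the result back into the expressions for $\mu_2$ and $\mu_3$ reproduces both moments.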

In the following table III, $X$ as given in column I is calculated from the median, $\Xi$ of column II is calculated by substituting the values of $X$, $a$, and $\kappa$ into equation (3), $\varphi(\Xi)$ of column III is taken from Czuber*. $\varphi(\Xi)\times\frac{5082}{2}$ is found in column IV. The calculated theoretical frequency is given in column V, the observed frequency in column VI, and the residuals in column VII.

Phil. Trans. Roy. Soc. 1895, vol. 186, p. 351.

TABLE III.
Above the median.

    I.        II.         III.        IV.      V.     VI.   VII.

  3.7087   0.187014   0.208601      530     729    817     90
  8.7087   0.419821   0.44732      1136     608    600      6
 13.7087   0.635218   0.63128      1604     468    513     45
 18.7087   0.836317   0.76308      1939     335    315     20
 23.7087   1.022475   0.851733     2164     225    208     17
 28.7087   1.20572    0.91181      2317     153    108     45
 33.7087   1.376364   0.94346      2410      93     75     18
 38.7087   1.540341   0.97063      2466      56     38     18
 43.7087   1.693141   0.98368      2499      33     28      5
 48.7087   1.849278   0.99106      2518      19     18      1
 53.7087   1.996373   0.99522      2529      11     17      6
 58.7087   2.138173   0.99748      2534       5      5      0
 63.7087   2.275739   0.998705     2538       4      4      0
 68.7087   2.409373   0.999343     2539      1+      2      1
 73.7087   2.539935   0.999670     2540       1      1      0
 78.7087   2.66654    0.999837     2541-     1-      2      1
 83.7087   2.78974    0.9999199    2541-      0      0      0
 88.7087                                      0      0      0
 93.7087                                      0      1      1

Below  the 

median. 

1.2913 

0.06357 

0.077827 

197 

6.2913 

0.355288 

0.384663 

977 

730 

746 

34 

11.2913 

0.687309 

0.86926 

1700 

723 

724 

1 

* Wahrscheinlichkeitsrechnung, page 385.
I.

II. 

III. 

IV. 

V. 

VI. 

VII. 

16.2913 

1.10381 

0.38029 

2237 

537 

483 

49 

21.2913 

1.718048 

0.984883 

2503 

286 

248 

16 

28.2913 

imaginary 

94 

94 

31.2913 

If 

26 

28 

33.2913 

n 

2 

2 

41.2913 

n 

1 

1 

48.2913 

n 

. . 

. . 

51.2913 

ft 

1 

1 

This function fits the observed distribution better than the normal frequency function or the transformation discussed in section II for the portion of the range which gives real values. For the positive end of the range and the first four frequency groups of the negative range, this function describes the observed distribution fairly well. From the fourth term on, the terms on the negative side are imaginary. Professor Edgeworth states* "that the difficulty from occurrence of imaginary values is not apprehended". In this example imaginary values of $a$ and $\kappa$ do not occur, but there is difficulty from the values of $\Xi$ becoming imaginary.

Here another problem presents itself, namely within what interval must $\Xi$ remain in order that all negative values of $\Xi$ will remain in the negative region for the transformed function $X$,

$$X = 19.1028\,\Xi + 3.8275\,\Xi^2,$$

but $\Xi = -t$ where $t$ is positive. When is $-19.1028t + 3.8275t^2 < 0$?

$$3.8275t^2 < 19.1028t,$$
$$t < 5.$$

Therefore $\Xi$ must lie within the interval $-5 < \Xi < 0$ so that any negative value of $\Xi$ will give a negative $X$.

* International Congress of Mathematicians, 1912, vol. II, p. 431.

Pearson's criterion of best fit* applied for real values of $\Xi$ gives $\chi^2 = 44.5$ and the probability $P = 0.0053$ that deviations as great as or greater than these would occur under random sampling, where

$$\chi^2 = \sum\frac{(\text{squares of differences of theoretical and observed frequencies})}{(\text{theoretical frequencies})}.$$
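The criterion is a one-line sum. The sketch below shows the arithmetic only; the function name `pearson_chi_square` is hypothetical and the two short lists are made-up illustrative frequencies, not the weight data.

```python
def pearson_chi_square(theoretical, observed):
    """Sum of (theoretical - observed)**2 / theoretical over the frequency groups."""
    if len(theoretical) != len(observed):
        raise ValueError("the two frequency lists must have the same length")
    return sum((t - o) ** 2 / t for t, o in zip(theoretical, observed))


# Illustrative frequencies only: 4/10 + 4/20 + 9/30.
print(pearson_chi_square([10.0, 20.0, 30.0], [12.0, 18.0, 33.0]))
```

Groups with a theoretical frequency of zero have to be merged with their neighbours before the sum is taken, since each term divides by the theoretical frequency.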

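Another transformation which might be of interest for some special cases is to replace the variable $x$ by $X$ where $X = x + x^2$, $(1 + 2x)\,dx = dX$, $dx = \frac{dX}{\sqrt{1 + 4X}}$,

$$\frac{1}{\sigma\sqrt{2\pi}}\int e^{-\frac{(x-a)^2}{2\sigma^2}}\,dx = \frac{1}{\sigma\sqrt{2\pi}}\int\frac{1}{\sqrt{1+4X}}\,e^{-\frac{\left(\frac{-1+\sqrt{1+4X}}{2}-a\right)^2}{2\sigma^2}}\,dX.$$

It is necessary that the median of the normal curve should be sufficiently far above the origin so that all variates be positive.

Replacing each $x$ by $X$ and solving for the constants by method of moments:

First moment.

$$\xi = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{+\infty}(x + x^2)\,e^{-\frac{(x-a)^2}{2\sigma^2}}\,dx = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{+\infty}\left[(x'+a) + (x'+a)^2\right]e^{-\frac{x'^2}{2\sigma^2}}\,dx',$$

$$\xi = (a + a^2) + \sigma^2. \qquad \text{I.}$$

Second moment.

$$\xi^2 + \mu_2 = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{+\infty}(x + x^2)^2\,e^{-\frac{(x-a)^2}{2\sigma^2}}\,dx = 3\sigma^4 + \sigma^2(6a^2 + 6a + 1) + a^4 + 2a^3 + a^2. \qquad \text{II.}$$

Third moment.

$$\xi^3 + 3\xi\mu_2 + \mu_3 = 15\sigma^6 + 3\sigma^4(15a^2 + 15a + 3) + \sigma^2(15a^4 + 30a^3 + 18a^2 + 3a) + a^6 + 3a^5 + 3a^4 + a^3. \qquad \text{III.}$$

Substituting the value of $\xi$ from equation I into equations II and III,

$$\mu_2 = 2\sigma^4 + \sigma^2(4a^2 + 4a + 1) \qquad (4)$$

$$\mu_3 = 8\sigma^6 + 6(4a^2 + 4a + 1)\,\sigma^4 \qquad (5)$$

From equation (4)

$$4a^2 + 4a + 1 = \frac{\mu_2 - 2\sigma^4}{\sigma^2} \qquad (6)$$

Substituting the value of the left hand member of (6) in (5),

$$4\sigma^6 - 6\sigma^2\mu_2 + \mu_3 = 0.$$

Putting $\sigma^2 = f$,

$$4f^3 - 6f\mu_2 + \mu_3 = 0.$$

This is the same cubic as equation (6) of section II, as it must be, since $X = x + x^2 = (x + \tfrac{1}{2})^2 - \tfrac{1}{4}$ is the square of a normal variate centered at $a + \tfrac{1}{2}$, shifted by a constant. Since the equation in $a$ is one of the second degree, the solution of the two equations is facilitated by substituting the value of $4a^2 + 4a + 1$ from equation (6) into equation (5).

* Biometrika, vol. I, pp. 155-181.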
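The central moments of $X = x + x^2$ can be checked exactly from the raw moments of a normal variate. The sketch below assumes only the identity $E[z^{2m}] = \sigma^{2m}(2m-1)!!$; the helper names are hypothetical. It compares the result with $\mu_2 = 2\sigma^4 + \sigma^2(2a+1)^2$ and $\mu_3 = 8\sigma^6 + 6\sigma^4(2a+1)^2$, the forms that follow from writing $X = (x + \tfrac{1}{2})^2 - \tfrac{1}{4}$.

```python
from math import comb


def normal_raw_moment(k, a, sigma):
    """E[(a + z)**k] for z ~ N(0, sigma**2), by the binomial expansion."""
    total = 0
    for j in range(0, k + 1, 2):           # odd moments of z vanish
        double_factorial = 1
        for m in range(1, j, 2):           # (j - 1)!!
            double_factorial *= m
        total += comb(k, j) * a ** (k - j) * sigma ** j * double_factorial
    return total


def central_moments_x_plus_x2(a, sigma):
    """Second and third central moments of X = x + x**2, x ~ N(a, sigma**2)."""
    m = [normal_raw_moment(k, a, sigma) for k in range(7)]
    e1 = m[1] + m[2]                           # E[X]
    e2 = m[2] + 2 * m[3] + m[4]                # E[X**2]
    e3 = m[3] + 3 * m[4] + 3 * m[5] + m[6]     # E[X**3]
    mu2 = e2 - e1 ** 2
    mu3 = e3 - 3 * e1 * e2 + 2 * e1 ** 3
    return mu2, mu3


a, sigma = 3, 1
print(central_moments_x_plus_x2(a, sigma))
print(2 * sigma ** 4 + sigma ** 2 * (2 * a + 1) ** 2,
      8 * sigma ** 6 + 6 * sigma ** 4 * (2 * a + 1) ** 2)
```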

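V. TRANSFORMATION $X = a_1\Xi + a_2\Xi^2 + a_3\Xi^3$.

This type of transformation is more general, perhaps, and will change the general form of the curve. While $\Xi^2$ tends to pile the variates near the origin and in the negative region, $\Xi^3$ has a more flattening and distributing effect.

If $y = f(X)$, where $f(\Xi)$ is the normal function with the standard deviation $\sigma = \frac{1}{\sqrt{2}}$ or $\sigma\sqrt{2} = 1$ and $X = a(\Xi + \kappa\Xi^2 + \lambda\Xi^3)$, then the $n$th moment about the median is

$$\int X^n f(X)\,dX = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{+\infty} a^n(\Xi + \kappa\Xi^2 + \lambda\Xi^3)^n\,e^{-\Xi^2}\,d\Xi.$$

The required constants $a$, $\kappa$ and $\lambda$ can be solved for by method of moments. The moments are calculated from the median. For convenience we shall use Edgeworth's notation*. $M_1$, $M_2$, etc., represent the moments about the median.

$$M_1 = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{+\infty} a(\Xi + \kappa\Xi^2 + \lambda\Xi^3)\,e^{-\Xi^2}\,d\Xi = \frac{a\kappa}{2}. \qquad (1)$$

$$M_2 = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{+\infty} a^2(\Xi + \kappa\Xi^2 + \lambda\Xi^3)^2\,e^{-\Xi^2}\,d\Xi = a^2\left(\frac{1}{2} + \frac{3}{4}\kappa^2 + \frac{3}{2}\lambda + \frac{15}{8}\lambda^2\right),$$

$$M_3 = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{+\infty} a^3(\Xi + \kappa\Xi^2 + \lambda\Xi^3)^3\,e^{-\Xi^2}\,d\Xi.$$

$$\mu_2 = M_2 - M_1^2 = \frac{a^2}{2}\left(1 + \kappa^2 + 3\lambda + \frac{15}{4}\lambda^2\right) \qquad (2)$$

$$\mu_3 = M_3 - 3M_1\mu_2 - M_1^3 = a^3\kappa\left(\frac{3}{2} + \kappa^2 + 9\lambda + \frac{135}{8}\lambda^2\right) \qquad (3)$$

Since

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} \quad\text{and}\quad \eta^{**} = \frac{\mu_4}{\mu_2^2} - 3 \qquad (4)$$

we may solve for the constants $a$, $\kappa$, $\lambda$.

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{8\kappa^2\left(\frac{3}{2} + \kappa^2 + 9\lambda + \frac{135}{8}\lambda^2\right)^2}{\left(1 + \kappa^2 + 3\lambda + \frac{15}{4}\lambda^2\right)^3}$$

$$\eta = \frac{\mu_4}{\mu_2^2} - 3 = \frac{4\left(6\kappa^2 + 3\lambda + 3\kappa^4 + 54\kappa^2\lambda + 27\lambda^2 + 135\kappa^2\lambda^2 + \frac{405}{4}\lambda^3 + \frac{1215}{8}\lambda^4\right)}{\left(1 + \kappa^2 + 3\lambda + \frac{15}{4}\lambda^2\right)^2}$$

We may use $\chi = \kappa^2$.

$$8\chi\left(\tfrac{3}{2} + \chi + 9\lambda + \tfrac{135}{8}\lambda^2\right)^2 - \beta_1\left(1 + \chi + 3\lambda + \tfrac{15}{4}\lambda^2\right)^3 = 0 \qquad (5)$$

$$4\left(6\chi + 3\lambda + 3\chi^2 + 54\chi\lambda + 27\lambda^2 + 135\chi\lambda^2 + \tfrac{405}{4}\lambda^3 + \tfrac{1215}{8}\lambda^4\right) - \eta\left(1 + \chi + 3\lambda + \tfrac{15}{4}\lambda^2\right)^2 = 0 \qquad (6)$$

These equations are not always so difficult to solve as they seem. This is due to the fact that $\kappa$ and $\lambda$ are usually small fractions. The higher powers will, therefore, become small. In trying out the numerical data used in sections II and IV, I made a first approximation, using only the first powers of $\chi$ and $\lambda$. Then from equations (5) and (6) we get:

$$18\chi - 0.707005(1 + 3\chi + 9\lambda) = 0 \qquad (7)$$

and

$$24\chi + 12\lambda - 1.833691(1 + 2\chi + 6\lambda) = 0 \qquad (8).$$

Solving for $\chi$ and $\lambda$,

$\lambda = 0.1015$,
$\chi = 0.0852$.

Here $\lambda$ and $\chi$ seem too large, so a second approximation was made. This was done by substituting for each $\lambda$ and $\chi$ in equations (5) and (6) the value of $\lambda$ and $\chi$ plus some new $\lambda$ and $\chi$, say $\lambda'$ and $\chi'$. Thus $\lambda = 0.1015 + \lambda'$, and $\chi = 0.0852 + \chi'$. Only the first powers of $\chi'$ and $\lambda'$ were retained because of the difficulty of solution of higher powers.

$\chi' = -0.030603$,
$\lambda' = -0.037525$.
Then $\chi = \chi' + 0.08529$,
$\chi = 0.054069$,
$\lambda = \lambda' + 0.101529$,
$\lambda = 0.07303$.
Since $\chi = \kappa^2$, $\kappa = \pm 0.233679$.

* Fifth International Congress of Mathematicians, Proceedings, vol. II, 1912, p. 430.
** The $\eta = \beta_2 - 3$ in Pearson's notation.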
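The first approximation is a pair of linear equations in $\chi$ and $\lambda$, so it can be solved directly. A minimal sketch using Cramer's rule; the function name `first_approximation` is hypothetical, and the two numerical inputs are the values of $\beta_1$ and $\eta$ that appear in equations (7) and (8).

```python
def first_approximation(beta1, eta):
    """Solve the linearized pair (7), (8) for chi and lambda.

    (7): 18*chi - beta1*(1 + 3*chi + 9*lam) = 0
    (8): 24*chi + 12*lam - eta*(1 + 2*chi + 6*lam) = 0
    """
    a11, a12, b1 = 18.0 - 3.0 * beta1, -9.0 * beta1, beta1
    a21, a22, b2 = 24.0 - 2.0 * eta, 12.0 - 6.0 * eta, eta
    det = a11 * a22 - a12 * a21
    chi = (b1 * a22 - a12 * b2) / det
    lam = (a11 * b2 - a21 * b1) / det
    return chi, lam


chi, lam = first_approximation(0.707005, 1.833691)
print(round(chi, 4), round(lam, 4))
```

The second approximation described in the text repeats the same linear solve for the corrections $\chi'$ and $\lambda'$ after expanding (5) and (6) about these starting values.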

These  values  of  x  and  A  are  substituted  in  equation  (2)  and  a  is 
solved  for. 

a2  =  293.3694 
a  =  +  17.1425. 

These  constants  have  been  determined  in  a  more  precise  way  than  in  section 
IV.  Because  of  the  immense  amount  of  labor  that  would  be  involved  in  solv- 
ing a  cubic  equation  by  Horner Ts  method  for  each  value  of  ^  -  and  in 
choosing  the  root  appropriate  for  the  transformation  and  because  x,A  ,.. 


are  a  decreasing  sequence,  that  is  A  is  smaller  than  k,  it  was  thought 
best  to  retain  only  the  first  and  second  powers  of  £. 
From $X = a(\xi + \kappa\xi^2)$,

$$\xi = \frac{-1 + \sqrt{1 + 4\kappa X/a}}{2\kappa}.$$

In table IV the theoretical frequency of this sort of transformation is
given. Column I gives the value of $X$ as calculated from the median; col.
II gives $\xi$ calculated from $a$, $\kappa$, $X$, where
$\xi = \dfrac{-1 + \sqrt{1 + 4\kappa X/a}}{2\kappa}$; col. III,
$\varphi(\xi)$; col. IV, $\varphi(\xi) \times 2541$; col. V the theoretical
frequency; col. VI the observed frequency; col. VII the residuals.
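
The construction of columns II to IV can be sketched as follows. Two assumptions are made that the text does not state outright: that the root used is ξ = (−1 + √(1 + 4κX/a)) / (2κ), and that φ(ξ) is the error function erf(ξ), with which the tabulated values agree closely. The function names and the constant `HALF_N` are hypothetical.

```python
import math

A = 17.1425        # scale constant a quoted in the text
KAPPA = 0.233679   # kappa quoted in the text
HALF_N = 2541      # multiplier applied in column IV


def xi_from_x(x, a=A, kappa=KAPPA):
    """Root of x = a*(xi + kappa*xi**2) that tends to x/a as kappa -> 0.

    Returns None when the discriminant is negative, i.e. when the
    corresponding xi would be imaginary.
    """
    disc = 1.0 + 4.0 * kappa * x / a
    if disc < 0.0:
        return None
    return (-1.0 + math.sqrt(disc)) / (2.0 * kappa)


def table_row(x):
    """Columns II, III and IV for an abscissa x measured from the median."""
    xi = xi_from_x(x)
    phi = math.erf(xi)           # assumption: phi is the error function
    return xi, phi, round(phi * HALF_N)


for x in (3.7087, 8.7087, 13.7087):
    xi, phi, cumulative = table_row(x)
    print(f"{x:8.4f}  {xi:.5f}  {phi:.5f}  {cumulative}")
```

Column V then follows as the difference between successive entries of column IV, and column VII as the difference between columns V and VI.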

                              TABLE IV.
                          Above the median.

    I.        II.         III.         IV.     V.    VI.   VII.

  3.7087    0.20637     0.22956        583    583    597     14
  8.7087    0.45898     0.48366       1229    646    600     46
 13.7087    0.68844     0.66970       1702    473    513     40
 18.7087    0.902113    0.79745       2026    324    315      9
 23.7087    1.09965     0.88009       2236    210    208      2
 28.7087    1.28715     0.93133       2367    131    108     23
 33.7087    1.46496     0.961696      2444     77     75      2
 38.7087    1.63498     0.979232      2488     44     38      6
 43.7087    1.79579     0.989079      2513     25     28      3
 48.7087    1.95321     0.9942523     2526     13     18      5
 53.7087    2.101108    0.9970339     2533      7     17     10
 58.7087    2.24588     0.998608      2537      4      5      1
 63.7087    2.38584     0.9992582     2539      2      4      2
 68.7087    2.52658     0.9996473     2540      1      2      1
 73.7087    2.65404     0.999815      2541      1      1      0
 78.7087    2.907407    0.99993       2541      0      2      2
 83.7087    3.02934     0.9999809     2541      0      0      0
 88.7087    3.14875     0.9999915     2541      0      0      0
 93.7087    3.26559     0.999998      2541      0      1      0

                          Below the median.

    I.        II.         III.         IV.     V.    VI.   VII.

  1.2913    0.076707    0.097577       248    248
  6.2913    0.401854    0.43018       1093    845    748     97
 11.2913    0.81323     0.74992       1905    812    724     88
 16.2913    1.424583    0.956059      2429    524    488     36
 21.2913    imaginary                                248    248
 26.2913        "                                     94     94
 31.2913        "                                     26     26
 36.2913        "                                      2      2
 41.2913        "                                      1      1
 46.2913        "                                      0      0
 51.2913        "                                      1      1

The upper half of this transformed curve describes the given distribution
very well. There is, however, absolutely no fit below the centroid,
the values becoming imaginary after the third frequency group below the
origin. It is to be observed that the negative values appear in this
transformation sooner than in the other transformations. This is due to
the fact that $a$ is not large enough as compared with $\kappa$.
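
The point at which the values turn imaginary can be located directly. The sketch below assumes that only the first and second powers of ξ are retained, so that ξ is real only while 1 + 4κX/a is non-negative, that is, for X no smaller than −a/(4κ). The names `real_limit` and `is_real` are hypothetical, and the abscissae are taken to be spaced 5 units apart as in Table IV.

```python
A = 17.1425        # scale constant a quoted in the text
KAPPA = 0.233679   # kappa quoted in the text


def real_limit(a=A, kappa=KAPPA):
    """Most negative X for which 1 + 4*kappa*X/a is still non-negative."""
    return -a / (4.0 * kappa)


def is_real(x, a=A, kappa=KAPPA):
    """True when the quadratic in xi has a real root for this X."""
    return 1.0 + 4.0 * kappa * x / a >= 0.0


# Abscissae below the median, 1.2913, 6.2913, ..., 51.2913, taken negative.
below = [-(1.2913 + 5 * i) for i in range(11)]
real_rows = [x for x in below if is_real(x)]

print(f"limit of real values: {real_limit():.2f}")
print("real abscissae:", [round(-x, 4) for x in real_rows])
print(f"limit with kappa halved: {real_limit(kappa=KAPPA / 2):.2f}")
```

Because the limit is proportional to a/κ, halving κ for the same a moves it twice as far from the median, which is the sense in which a larger a relative to κ postpones the imaginary values.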

Applying  Pearson's  criterion  of  fit  to  the  part  above  the  median,  we 
find  that 

$$\chi^2 = 22.47$$

and  the  probability  that  deviations  as  great  as  these  would  occur  under 
random  sampling  is 

P  =  0.055  . 

In conclusion, it may be fitting to say that an indefinite number of
transformations are possible. The difficulty is in selecting, a priori,
the appropriate transformation for the particular data given. The
transformed functions considered in this paper have advantages in being
capable of application to data which in general appearance deviate from
the normal curve. This fact is shown by the graphs of the transformed
function (figs. 1-3).

The  normal  function  does  not  describe  the  data  considered  in  this 
paper.    While  the  transformed  function  discussed  in  section  II  is  better 
for  part  of  the  data,  there  are  other  parts  which  this  function  does  not 
describe. 

The Pearsonian criterion of fit could not be applied as the tables
do not extend far enough; in other words, there is little probability that
such great deviation is due to random sampling. If the part of the
theoretical distribution above the median is considered by itself, the
probability that such deviation is due to random sampling is P = 0.000001.
Thus even the part above the median is a very bad fit.

The  function  considered  in  section  IV  gives  a  much  better  description 
except  where  the  imaginary  numbers  appear. 

The  theoretical  distribution  of  section  V  describes  the  part  above 
the  median  very  well,  as  was  found  by  application  of  Pearson's  criterion 
of  fit.  Below  the  median  there  are  only  three  real  values,  the  remaining 
values  being  imaginary. 

The disappearance of the difficulty of imaginary values here obtained
is to be expected when $\kappa$ is smaller compared to $a$ than it is in
this problem. This condition is doubtless realized for cases which deviate
but slightly from the normal. It seems difficult to determine, a priori, other
conditions under which $\kappa/a$ is sufficiently small so that we entirely
avoid the occurrence of imaginary values of $\xi$ to correspond to
numerically large negative values of $X$.


